163 research outputs found
NAS-VAD: Neural Architecture Search for Voice Activity Detection
Various neural network-based approaches have been proposed for more robust
and accurate voice activity detection (VAD). Manual design of such neural
architectures is an error-prone and time-consuming process, which prompted the
development of neural architecture search (NAS), which automatically designs and
optimizes network architectures. While NAS has been successfully applied to
improve performance in a variety of tasks, it has not yet been exploited in the
VAD domain. In this paper, we present the first work that utilizes NAS
approaches on the VAD task. To effectively search architectures for the VAD
task, we propose a modified macro structure and a new search space with a much
broader range of operations that includes attention operations. The results
show that the network structures found by the proposed NAS framework outperform
previous manually designed state-of-the-art VAD models in various noise-added
and real-world-recorded datasets. We also show that the architectures searched
on a particular dataset achieve improved generalization performance on unseen
audio datasets. Our code and models are available at
https://github.com/daniel03c1/NAS_VAD.
Comment: Submitted to Interspeech 202
Neural Residual Flow Fields for Efficient Video Representations
Neural fields have emerged as a powerful paradigm for representing various
signals, including videos. However, research on improving the parameter
efficiency of neural fields is still in its early stages. Even though neural
fields that map coordinates to colors can be used to encode video signals, this
scheme does not exploit the spatial and temporal redundancy of video signals.
Inspired by standard video compression algorithms, we propose a neural field
architecture for representing and compressing videos that deliberately removes
data redundancy through the use of motion information across video frames.
Maintaining motion information, which is typically smoother and less complex
than color signals, requires far fewer parameters. Furthermore,
reusing color values through motion information further improves the network
parameter efficiency. In addition, we suggest using more than one reference
frame for video frame reconstruction and separate networks, one for optical
flows and the other for residuals. Experimental results have shown that the
proposed method outperforms the baseline methods by a significant margin. The
code is available at https://github.com/daniel03c1/eff_video_representation.
Comment: Accepted for ACCV 2022.
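The idea of reusing colors through motion information and storing only a residual can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `warp` and `reconstruct`, the integer per-pixel flow, the border clamping, and the single reference frame are all assumptions made for illustration, whereas the abstract describes neural networks that predict flows and residuals and uses more than one reference frame.

```python
def warp(reference, flow):
    """Build a frame by copying each pixel from reference[y + dy][x + dx].

    reference: 2D list of gray values (hypothetical toy frame).
    flow: 2D list of (dy, dx) integer offsets, one per output pixel.
    Source coordinates are clamped to the frame border (an assumption).
    """
    h, w = len(reference), len(reference[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = reference[sy][sx]
    return out


def reconstruct(reference, flow, residual):
    """Reconstructed frame = motion-compensated reference + residual."""
    warped = warp(reference, flow)
    return [
        [warped[y][x] + residual[y][x] for x in range(len(warped[0]))]
        for y in range(len(warped))
    ]


# Toy example: the content moves one pixel to the right, so every output
# pixel reads from its left neighbour; one pixel needs a small correction.
reference = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
flow = [[(0, -1)] * 3 for _ in range(2)]
residual = [[0.5, 0.0, 0.0], [0.0, 0.0, 0.0]]
frame = reconstruct(reference, flow, residual)
```

In this toy setting the residual is zero almost everywhere, which mirrors the intuition in the abstract that motion plus a small correction can be cheaper to represent than raw colors.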
Understanding Contrastive Learning Through the Lens of Margins
Contrastive learning, along with its variations, has been a highly effective
self-supervised learning method across diverse domains. Contrastive learning
measures the distance between representations using cosine similarity and uses
cross-entropy for representation learning. Within the same framework of
cosine-similarity-based representation learning, margins have played a
significant role in enhancing face and speaker recognition tasks.
Interestingly, despite the shared reliance on the same similarity metrics and
objective functions, contrastive learning has not actively adopted margins.
Furthermore, decision-boundary-based explanations are the only ones that have
been used to explain the effect of margins in contrastive learning. In this
work, we propose a new perspective to understand the role of margins based on
gradient analysis. Based on the new perspective, we analyze how margins affect
gradients of contrastive learning and separate the effect into more elemental
levels. We separately analyze each and provide possible directions for
improving contrastive learning. Our experimental results demonstrate that
emphasizing positive samples and scaling gradients depending on positive sample
angles and logits are the keys to improving the generalization performance of
contrastive learning in both seen and unseen datasets, and other factors can
only marginally improve performance.
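As a rough illustration of how a margin enters a cosine-similarity, cross-entropy contrastive loss, the sketch below applies an additive angular margin to the positive pair only. It is an assumption-laden example rather than the paper's formulation: the function names, the temperature value, the clamping of the angle to pi, and the choice of an angular margin (in the style used in face recognition) are all illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def margin_contrastive_loss(anchor, positive, negatives,
                            temperature=0.1, angular_margin=0.0):
    """Cross-entropy over cosine logits with a margin on the positive angle.

    The positive logit is cos(theta + margin) / temperature, where theta is
    the angle between anchor and positive. The angle is clamped to pi so the
    penalised logit never increases again (an assumption of this sketch).
    """
    cos_pos = max(-1.0, min(1.0, cosine(anchor, positive)))
    theta = math.acos(cos_pos)
    penalised = math.cos(min(theta + angular_margin, math.pi))
    pos_logit = penalised / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp, then the cross-entropy with the
    # positive pair as the target class.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum - pos_logit


anchor = [1.0, 0.0]
positive = [0.8, 0.6]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
loss_plain = margin_contrastive_loss(anchor, positive, negatives)
loss_margin = margin_contrastive_loss(anchor, positive, negatives,
                                      angular_margin=0.2)
```

With the margin enabled, the same positive pair yields a larger loss, so the model is pushed to pull positives closer than it would without the margin. The gradient-level effects analyzed in the paper are not reproduced here.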
How Do Foreign Accents Impact Perception and Credibility?
The paper aims to investigate how foreign accents impact perception and credibility by looking at various experiments that the researchers have conducted. To observe the effects that foreign accents have on listeners, we outlined three critical areas: visual and auditory stimuli, subtitle comprehension, and perception. By having an in-group or native accent as our control group, we were able to evaluate how various accents, such as Dutch and German, have a subtle impact on the accuracy of the speakers rated and measured by the participants. Based on our analysis, foreign-accented speakers are perceived to be less credible. In addition, it was concluded that perception also plays a key role in the day-to-day life of non-native speakers. While more research would be beneficial, it is clear that foreign accents reduce the speakers’ credibility and should be considered in environments such as job interviews and other social settings.
Hexa: Self-Improving for Knowledge-Grounded Dialogue System
A common practice in knowledge-grounded dialogue generation is to explicitly
utilize intermediate steps (e.g., web-search, memory retrieval) with modular
approaches. However, data for such steps are often inaccessible compared to
those of dialogue responses as they are unobservable in an ordinary dialogue.
To fill in the absence of these data, we develop a self-improving method to
improve the generative performances of intermediate steps without the ground
truth data. In particular, we propose a novel bootstrapping scheme with a
guided prompt and a modified loss function to enhance the diversity of
appropriate self-generated responses. Through experiments on various benchmark
datasets, we empirically demonstrate that our method successfully leverages a
self-improving mechanism in generating intermediate and final responses and
improves performance on the task of knowledge-grounded dialogue
generation.
Highly Clumpy Structure of the Thermal Composite Supernova Remnant 3C391 Unveiled by Chandra
The nature of the internal thermal X-ray emission seen in "thermal
composite" supernova remnants is still uncertain. A Chandra observation of
3C391 shows a southeast-northwest elongated morphology and unveils a highly
clumpy structure of the remnant. Detailed spatially resolved spectral analysis
for the small-scale features reveals normal metal abundance and uniform
temperature for the interior gas. The properties of the hot gas comparatively
favor the cloudlet evaporation model as a main mechanism for the "thermal
composite" X-ray appearance, though radiative rim and thermal conduction may
also be effective. A faint protrusion is found in Si and S lines out of the
southwest radio border.
Comment: 7 pages, 4 embedded figures, in COSPAR 2004 session E1.4, "Young
Neutron Stars and Supernova Remnants", Advances in Space Research, in press
Prospects for Pentaquark Production at Meson Factories
Following Rosner [hep-ph/0312269], we consider B-decay production channels
for the exotic I=0 and pentaquarks that have been recently reported. We
also discuss new search channels for isovector pentaquarks, such as the
, that are generically present in chiral soliton
models but were not observed in recent experiments. Furthermore, we argue that
weak decays of charmed baryons, such as the and ,
provide another clean way of detecting exotic baryons made of light quarks
only. We also discuss discovery channels for charmed pentaquarks, such as the
isosinglet , in weak decays of bottom mesons and
baryons. Finally, we discuss prospects for inclusive production of pentaquarks
in collisions, with associated production of particles carrying the
opposite baryon number.
Comment: 15 pages, LaTeX; v2,v3: minor corrections, references added; v4:
minor modifications, the version published in Physics Letters
Interstellar Silicate Dust in the z=0.89 Absorber Towards PKS 1830-211: Crystalline Silicates at High Redshift?
We present evidence of a >10-sigma detection of the 10 micron silicate dust
absorption feature in the spectrum of the gravitationally lensed quasar PKS
1830-211, produced by a foreground absorption system at redshift 0.886. We have
examined more than 100 optical depth templates, derived from both observations
of Galactic and extragalactic sources and laboratory measurements, in order to
constrain the chemical structure of the silicate dust. We find that the best
fit to the observed absorption profile is produced by laboratory crystalline
olivine, with a corresponding peak optical depth of tau_10=0.27+/-0.05. The fit
is slightly improved upon by including small contributions from additional
materials such as silica, enstatite, or serpentine, which suggests that the
dust composition may consist of a blend of crystalline silicates. Combining
templates for amorphous and crystalline silicates, we find that the fraction of
crystalline silicates needs to be at least 95%. Given the rarity of
extragalactic sources with such a high degree of silicate crystallinity, we
also explore the possibility that the observed spectral features are produced
by amorphous silicates in combination with other molecular or atomic
transitions, or by foreground source contamination. While we cannot rule out
these latter possibilities, they lead to much poorer profile fits than for the
crystalline olivine templates. If the presence of crystalline interstellar
silicates in this distant galaxy is real, it would be highly unusual, given
that the Milky Way interstellar matter contains essentially only amorphous
silicates. It is possible that the z=0.886 absorber towards PKS 1830-211, well
known for its high molecular content, has a unique star-forming environment
that enables crystalline silicates to form and prevail.
Comment: 67 pages, 21 figures, accepted for publication in the Astrophysical
Journal